A systolic Viterbi decoder for convolutional codes is developed. This decoder 
uses the trace-back method to reduce the amount of data needed to be stored in 
registers. It is shown that this new algorithm requires a smaller chip size and 
achieves a faster decoding time than other existing methods. 


I. Introduction 

Convolutional coding with Viterbi decoding [1] is a 
powerful method for forward error correction. As a con- 
sequence there is a growing need for implementing the 
Viterbi decoder in VLSI in deep-space communication [2]. 

There are two classes of algorithms established for 
realizing the Viterbi decoder. These are the register- 
exchange and trace-back methods [3]. Unfortunately, each 
class has drawbacks. Both require a substantial amount
of storage space and hardware for even a moderate speed,
while the latter, for long constraint lengths, also needs a
long decoding time.

In this article, the Viterbi decoding algorithm is first 
presented for comparison with other methods. For this 
example, a modified trace-back algorithm is created and 
shown to require a minimal number of storage devices and 
a short decoding time. Next, a systolic architecture is de- 
veloped for this modified trace-back decoding algorithm. It 
is demonstrated for this new trace-back architecture that 
the tracking, updating, and storing of the hypothesized
information sequences for the Viterbi decoder can be
accomplished simultaneously during a single clock cycle. Finally,
a suitable VLSI implementation is suggested for this new 
systolic architecture. 

II. Example of a Viterbi Decoder for a (3,1/2) 
Convolutional Code 

Let $K$ and $R$ be the constraint length and the rate,
respectively, of what is denoted a $(K, R)$ convolutional code
(CC). In this section, the nature of the Viterbi decoding
algorithm is demonstrated by using a (3,1/2) convolutional
code.

Consider the example of a (3,1/2) CC with the generator
polynomial $G(x) = (x^2 + x + 1,\; x^2 + 1)$. The encoder
for this code is shown in Fig. 1, where one observes
that the encoder is composed of a two-stage shift register
$A = (A_1, A_2)$ with three modulo-2 adders and a multiplexer
for converting a parallel to a serial output, where $A_i$
for $i = 1, 2$ denotes a one-bit register. Let the information
sequence be $u = (u_1, u_2, u_3, u_4, \ldots) = (0, 0, 1, 0, \ldots)$.
After encoding the information sequence $u = (0, 0, 1, 0, \ldots)$,
the output code word is given by
$v = (v_1, v_2, \ldots) = (00, 00, 11, 10, \ldots)$.
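The encoding rule above can be sketched in a few lines of Python. This is only an illustration: the function name `encode` is hypothetical, and the shift register is assumed to start cleared, which the text does not state explicitly but which the all-zero start of the example implies.

```python
def encode(info_bits):
    """Encode bits with the rate-1/2 code G(x) = (x^2 + x + 1, x^2 + 1)."""
    a1 = a2 = 0  # two-stage shift register, assumed to start cleared
    frames = []
    for u in info_bits:
        first = u ^ a1 ^ a2   # taps of x^2 + x + 1
        second = u ^ a2       # taps of x^2 + 1
        frames.append(f"{first}{second}")
        a1, a2 = u, a1        # shift the new information bit in
    return frames


print(encode([0, 0, 1, 0]))  # ['00', '00', '11', '10']
```

The printed frames agree with the code word $(00, 00, 11, 10, \ldots)$ quoted in the text.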

The trellis diagram for this (3,1/2) CC is shown in Fig. 2.
In each column of the trellis there are $2^k = 2^2 = 4$ nodes,
one for each possible state of the two-stage ($k = 2$) shift
register of the encoder. Each information bit causes the shift
register to change state. This is represented by a branch from
the present node to the next node. Each branch in the $j$th
column is labeled with the single output code-word frame $v_j$.
The upper branch leaving a node at time unit $j - 1$ represents
the "zero" input bit $u_j = 0$, while the lower branch represents
the "one" input bit $u_j = 1$. The code word that corresponds
to the information sequence $u = (0, 0, 1, 0, 1, 0, 0, 0, \ldots)$ is
shown in Fig. 2 as a heavy-line path.

Assume that a code word $v = (v_1, v_2, v_3, \ldots)$ is transmitted
over a binary symmetric channel (BSC) and that the received
sequence is $r = (r_1, r_2, r_3, \ldots)$. The branch metric from
state $S_l$ at time unit $j - 1$ to $S_k$ at time unit $j$ is
defined by

$$ d_{j-1,j}(l,k) = \| r_j - v_j \| \tag{1} $$

where $d_{j-1,j}(l,k)$ denotes the Hamming path distance from
state $S_l$ at time unit $j - 1$ to $S_k$ at time unit $j$. Here also
$\| r_j - v_j \|$ is the Hamming weight of the difference of the
two binary vectors $r_j$ and $v_j$. The partial path metric for
a state $S_k$ up until the end of the first $j$ branches of a path
is denoted by $M_j(k)$. If there is a state $S_l$ at time $j - 1$
which can change to state $S_k$ at time $j$ along the given
path, then the partial path metric is expressible by the
difference equation

$$ M_j(k) = M_{j-1}(l) + d_{j-1,j}(l,k) \tag{2} $$

where $M_0(k) = 0$ for $k = 0$. The smallest metric for all
paths terminating at state $S_k$ at time $j$ is given by

$$ P_j(k) = \min M_j(k) \tag{3} $$

where the minimum is over all possible partial-path metrics
that end at state $S_k$ at time $j$. $P_j(k)$ is the weight of what
is called the survivor path to state $S_k$ at time unit $j$.
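As a minimal sketch of Eqs. (1) through (3), the following Python applies one metric update per received frame for the (3,1/2) code. The helper names (`hamming`, `branch_frame`, `acs_step`) are hypothetical, state $S_k$ is numbered by its state vector read as a binary number with the newest bit first (so $S_2 = 10$), and unreachable states are given an infinite starting metric, which is an assumption made here for convenience.

```python
INF = float("inf")


def hamming(a, b):
    """Hamming weight of the difference of two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def branch_frame(l, u):
    """Code-word frame on the branch leaving state S_l with input bit u."""
    a1, a2 = l >> 1, l & 1
    return f"{u ^ a1 ^ a2}{u ^ a2}"


def acs_step(prev_metrics, r):
    """Apply Eqs. (1)-(3) once: return P_j(k) for k = 0, 1, 2, 3."""
    new_metrics = []
    for k in range(4):
        u, a1 = k >> 1, k & 1              # S_k = (u, a1), newest bit first
        candidates = []
        for a2 in (0, 1):                  # the two possible predecessors S_l
            l = (a1 << 1) | a2
            d = hamming(r, branch_frame(l, u))          # Eq. (1)
            candidates.append(prev_metrics[l] + d)      # Eq. (2)
        new_metrics.append(min(candidates))             # Eq. (3)
    return new_metrics


metrics = [0, INF, INF, INF]               # decoding starts in state S_0
for r in ["00", "01", "11"]:
    metrics = acs_step(metrics, r)
    print(metrics)
```

With the first three received frames of the example that follows in the text (00, 01, 11), the last line printed is `[3, 2, 1, 2]`, listed in the order $k = 0, 1, 2, 3$.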

The Viterbi decoder for a convolutional code finds the 
survivor path, the path of minimum metric, which reaches 
a given state at time j. This survivor path is dependent 
on the input information bits so that the decoded bits are 
read off easily along this path. 


The Viterbi algorithm is outlined as follows: 

(1) Beginning at time unit j = 1, compute the par- 
tial metric for the single path entering each state. 
Store the path (survivor) and its metric for each 
state. 

(2) Increase j by 1. Compute the partial metric 
for all the paths entering a state by adding the 
branch metric entering that state to the metric 
of the connecting survivor at the preceding time 
unit. For each state, store the path with the 
smallest metric (the survivor), together with the 
metric, and eliminate all other paths. 

(3) If $j < L_c$, repeat step (2). Otherwise, stop.

In the above, $L_c$ is the length of the code word. The
final survivor with the smallest metric can be used to
decode the information bits along this path.
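Steps (1) through (3) can be sketched as below. This is one possible rendering, not the article's implementation: `viterbi_decode` is a hypothetical name, each survivor is stored as the list of its information bits, and ties between equal metrics are resolved in favor of the first predecessor examined (an arbitrary choice). The received sequence `R` is the 20-frame sequence used in the worked example of this section.

```python
INF = float("inf")


def viterbi_decode(received):
    """Keep one survivor path and its metric per state; return the best one."""
    metrics = [0, INF, INF, INF]           # decoding starts in state S_0
    paths = [[], [], [], []]               # information bits of each survivor
    for r in received:
        r1, r2 = int(r[0]), int(r[1])
        new_metrics, new_paths = [], []
        for k in range(4):
            u, a1 = k >> 1, k & 1          # S_k = (u, a1), newest bit first
            best = None
            for a2 in (0, 1):              # predecessor S_l with l = (a1, a2)
                l = (a1 << 1) | a2
                d = ((u ^ a1 ^ a2) != r1) + ((u ^ a2) != r2)
                m = metrics[l] + d
                if best is None or m < best[0]:
                    best = (m, paths[l] + [u])
            new_metrics.append(best[0])
            new_paths.append(best[1])
        metrics, paths = new_metrics, new_paths
    k = metrics.index(min(metrics))        # final survivor, smallest metric
    return paths[k], metrics[k]


R = "00 01 11 10 10 10 11 00 11 11 01 00 10 10 11 10 11 11 01 01".split()
bits, metric = viterbi_decode(R)
print(bits, metric)
```

Storing whole paths is the simplest way to follow the outline; the register-exchange and trace-back methods discussed in Section III are two ways of bounding that storage.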

As an example of the above Viterbi algorithm, assume
that a code word of this (3,1/2) CC is $v = (v_1, v_2, v_3, \ldots)$ =
(00, 00, 11, 10, 00, 10, 11, 00, 11, 01, 01, 00, 10, 11, 11,
10, 11, 11, 01, 01) and this code word is transmitted over
a BSC. The received sequence is $r = (r_1, r_2, r_3, \ldots)$ = (00,
01, 11, 10, 10, 10, 11, 00, 11, 11, 01, 00, 10, 10, 11, 10,
11, 11, 01, 01). The Viterbi decoder for this convolutional
code is illustrated in Fig. 3. At the beginning of the trellis
diagram, one observes from Fig. 3 that only a single path
enters each state. For example, states $S_0$ and $S_2$ follow
from state $S_0$ during the first time unit. By step (2), the
partial path metric for state $S_0$ at time unit 1 is
$M_1(0) = d_{0,1}(0,0) = \| r_1 - v_1 \| = 0$. Thus the survivor path
has metric $P_1(0) = M_1(0) = 0$. Similarly, one obtains the
partial metric for state $S_2$ as $P_1(2) = M_1(2) = 2$.

At the second time unit, only single paths enter states
$S_0$, $S_2$, $S_1$, and $S_3$. Thus the weights of survivors at $j = 2$
are found from $P_2(0) = M_2(0) = M_1(0) + d_{1,2}(0,0) = 1$,
$P_2(2) = M_2(2) = M_1(0) + d_{1,2}(0,2) = 1$,
$P_2(1) = M_2(1) = M_1(2) + d_{1,2}(2,1) = 4$, and
$P_2(3) = M_2(3) = M_1(2) + d_{1,2}(2,3) = 2$.

At time unit 3, there are two branches entering each
state. One of the two paths entering state $S_0$ is selected
by the algorithm using the following results: first,
$M_3(0) = M_2(0) + d_{2,3}(0,0) = 3$ or $M_3(0) = M_2(1) + d_{2,3}(1,0) = 4$;
hence, $P_3(0) = \min\{3, 4\} = 3$ for state $S_0$. Similarly, one
obtains $P_3(2) = 1$ for state $S_2$, $P_3(1) = 2$ for state $S_1$, and
$P_3(3) = 2$ for state $S_3$.

For the remaining frame times, the same procedure
yields the survivor path segments from each partial path
metric for each state. In Fig. 3, the metrics of the survivors
are denoted at each node. Assume that input information
bits stop at time unit 20 and that one chooses the smallest
metric among the four nodes. Thus, at time unit 20, the
smallest metric among survivors is $P_{20}(1) = 4$. This survivor
path that reaches state $S_1$ at time unit 20 is heavy
lined in Fig. 3. For this path, the decoded information
bits are (0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0).

Define $W_i$ to be a window of length $L$ at time unit $i$,
where the times of the start and end of the window are $i$
and $i + L - 1$, respectively. It is shown in [4] that if the
length of the information bit stream is large, the decoding-window
length $L$ is usually several times the constraint
length. At time unit $L$, the decoder chooses the surviving
path that reaches the state with the smallest survivor metric
by the first decoding window $W_1$. The first branch in $W_1$
can be decoded as the first information bit. This decoding
window then shifts one time unit to become the next decoding
window $W_2$. This new decoding window can be used to
decode the second information bit. The following
section discusses two different methods for realizing the
Viterbi decoder for a decoding window of length $L = 5K$.


III. Methods for Realizing the Viterbi 
Decoder 

There are two methods for approximately realizing the
Viterbi decoder [3]. The first method, called the register-exchange
method, calls for all paths to be stored for each
of the $2^k$ states. At each time unit, a new branch is processed
by comparing the partial path metrics. Then certain
registers are interchanged corresponding to the paths
that survived the comparison, and a new information bit
is added at one end of each register. After $5K$ branches
have been processed, the first bit of the register corresponding
to the smallest survivor is shifted out as the first decoded
bit. The register-exchange algorithm is illustrated in [5].
For long constraint-length codes, this algorithm requires a
substantial amount of storage space and hardware for even
moderate decoding speeds.
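A register-exchange decoder along these lines might look as follows. This is a sketch under stated assumptions rather than the scheme of [5]: the name `register_exchange_decode` is hypothetical, the window is fixed at 10 time units as in the article's example, ties are resolved toward the lower-numbered predecessor, and each "register" is modeled as a Python list.

```python
INF = float("inf")


def register_exchange_decode(received, window=10):
    """One register of decoded bits per state, exchanged at every time unit."""
    metrics = [0, INF, INF, INF]
    registers = [[], [], [], []]
    decoded = []
    for r in received:
        r1, r2 = int(r[0]), int(r[1])
        new_metrics, new_registers = [], []
        for k in range(4):
            u, a1 = k >> 1, k & 1
            candidates = []
            for a2 in (0, 1):
                l = (a1 << 1) | a2
                d = ((u ^ a1 ^ a2) != r1) + ((u ^ a2) != r2)
                candidates.append((metrics[l] + d, l))
            m, l = min(candidates)
            new_metrics.append(m)
            # Exchange: copy the surviving predecessor's register, add a bit.
            new_registers.append(registers[l] + [u])
        metrics, registers = new_metrics, new_registers
        if len(registers[0]) == window:
            best = metrics.index(min(metrics))
            decoded.append(registers[best][0])       # shift out the first bit
            registers = [reg[1:] for reg in registers]
    return decoded


R = "00 01 11 10 10 10 11 00 11 11 01 00 10 10 11 10 11 11 01 01".split()
print(register_exchange_decode(R))  # [0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]
```

Every register is rewritten at every time unit, which is the storage and wiring cost the paragraph above refers to.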

The second method, called the trace-back method,
does not store the actual information sequence but instead
stores the results of each comparison. After $5K$ branches
have been processed, the trellis connections are recalled
in reverse order. The path traced back through the trellis
diagram is used to decode the first bit. The following
constitutes the trace-back algorithm.

To trace back a survivor path through the trellis diagram,
one needs to store the trellis connections for each
state at each time unit. For example, the survivor paths
from time unit 3 to time unit 4 in Fig. 3 are shown in detail
in Fig. 4(a). The states at time unit 3 that transition to states
$S_0$, $S_2$, $S_1$, and $S_3$ at time unit 4 are the states $S_1$, $S_1$, $S_2$,
and $S_3$, respectively. In Fig. 4(a), if one needs to trace
the path from $S_2$ at time unit 4 back to $S_1$ at time unit
3, then one needs the information that a state transition
from $S_1$ at time unit 3 to state $S_2$ at time unit 4 occurred.
Note that the last bit of state vector $S_1$, i.e., the last bit of
01, shifts out of the shift register $A$ prior to the time that
state $S_1$ changes to $S_2$. Thus this last bit of state vector
$S_1$ needs to be stored in order to trace state $S_2$ back to
$S_1$. Let $y_j(k)$ be this one-bit information needed to trace
state $S_k$ at time unit $j$ back to the state of the survivor at
time unit $j - 1$. By this definition, $y_4(2) = 1$. As shown
in Fig. 4(a), the values $y_4(0)$, $y_4(2)$, $y_4(1)$, and $y_4(3)$ are
obtained as 1, 1, 0, and 1, respectively.

To trace a state vector back to the state vector of its
previous survivor, one notes first by the example in Fig. 4(a)
that the current state $S_2$ at time unit 4 is the result of
shifting a 1 into register $A$. This bit is the first bit of
state vector 10 of $S_2$ and should be deleted when tracing
back. Also, $y_4(2)$ should be appended as the last bit of this
state vector. Thus one can determine the previous state
vector to be 01 by deleting the first bit of state vector 10
of $S_2$ and concatenating the remainder with $y_4(2)$, which is bit '1'.
Every previous state of states $S_0$, $S_2$, $S_1$, and $S_3$ can be
easily obtained by using the same method. These previous
states are $S_1$, $S_1$, $S_2$, and $S_3$, respectively; they are shown
in Fig. 4(b).
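The single trace-back step just described (delete the first bit, append the stored bit) can be written as a short bit operation. The function name `previous_state` and its `n_bits` parameter are hypothetical; states are held as integers whose binary form is the state vector.

```python
def previous_state(state, y, n_bits=2):
    """Drop the first bit of the state vector and append the stored bit y."""
    mask = (1 << (n_bits - 1)) - 1      # keeps every bit except the first
    return ((state & mask) << 1) | y


# y_4(k) for k = 0, 2, 1, 3, as quoted in the text for Fig. 4(a).
y4 = {0: 1, 2: 1, 1: 0, 3: 1}
for state, y in y4.items():
    print(f"S{state} at time 4 <- S{previous_state(state, y)} at time 3")
```

Running it lists the predecessors $S_1$, $S_1$, $S_2$, $S_3$ for $S_0$, $S_2$, $S_1$, $S_3$, matching the states given for Fig. 4(b).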

As mentioned in Section II, one can use each decoding
window $W_i$ to decode the $i$th decoded bit. Note that the
decoding window length $L = 5K = 10$ in our example. To
decode the first decoded bit, first one uses the first decoding
window $W_1$ to construct the survivors by the Viterbi
algorithm in Section II. Also one obtains the $y_i(k)$ for
$1 \le i \le 10$. Let $\mathbf{y}_i$ be the column vector with elements $y_i(k)$
where $k = 0, 2, 1, 3$, i.e., $\mathbf{y}_i = [y_i(0), y_i(2), y_i(1), y_i(3)]^T$.
One obtains the $\mathbf{y}_i$ for $1 \le i \le 10$ to be
$\mathbf{y}_1 = [0,0,X,X]^T$, $\mathbf{y}_2 = [0,0,0,0]^T$, $\mathbf{y}_3 = [0,0,0,0]^T$,
$\mathbf{y}_4 = [1,1,0,1]^T$, $\mathbf{y}_5 = [1,1,0,1]^T$, $\mathbf{y}_6 = [0,0,0,1]^T$,
$\mathbf{y}_7 = [1,0,1,1]^T$, $\mathbf{y}_8 = [0,1,0,0]^T$, $\mathbf{y}_9 = [0,0,0,0]^T$,
and $\mathbf{y}_{10} = [1,0,0,0]^T$.
In the above, $\mathbf{y}_1 = [0,0,X,X]^T$ means that there are
not any state changes to state $S_1$ or $S_3$ at time unit 1.
Among these survivors, one chooses the state vector with
the smallest metric of the survivors at time unit 10 to be
$X = 11$ (state $S_3$). These $\mathbf{y}_i$'s and state vector $X = 11$
are shown in Fig. 5(a). In Fig. 5(a), the elements in each
column $\mathbf{y}_i$ represent the associated $y_i(k)$ for $k = 0, 2, 1, 3$.
One then recursively traces the survivor from state vector
11 at time unit 10 back to time unit 1 using the method
shown in Fig. 4(b). First, at time unit 9, the state vector
is obtained as DMSB(11) * $y_{10}(3)$ = 1 * 0 = 10, where
DMSB(11) denotes the state vector 11 without the first
bit and the operation * denotes the concatenation operation.
Then at time unit 8, the state vector is obtained as
DMSB(10) * $y_9(2)$ = 0 * 0 = 00. Also, the state vectors at
time units 7, 6, 5, 4, 3, 2, 1 can be obtained sequentially
as 00, 01, 10, 01, 10, 00, 00. Finally, one obtains the first
decoded bit as $Z$ = MSB(00) = 0, where MSB(00) denotes
the first bit of 00, since the input information bit is
the first bit of this state vector at time unit 1.
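The trace through window $W_1$ can be replayed with a small table lookup. In this sketch the names `Y_ROWS`, `ORDER`, `y`, and `trace_window` are hypothetical; the rows are the $\mathbf{y}_i$ vectors in the text's order $k = 0, 2, 1, 3$, with `None` standing for the "X" entries. The entry $y_6(3)$ is taken as 1, which is what the metric comparison gives for $r_6 = 10$; it does not lie on the traced path, so the result does not depend on it.

```python
# y_i(k) for i = 1..10; each row is ordered k = 0, 2, 1, 3 as in the text.
Y_ROWS = {
    1: [0, 0, None, None], 2: [0, 0, 0, 0], 3: [0, 0, 0, 0],
    4: [1, 1, 0, 1], 5: [1, 1, 0, 1], 6: [0, 0, 0, 1],
    7: [1, 0, 1, 1], 8: [0, 1, 0, 0], 9: [0, 0, 0, 0],
    10: [1, 0, 0, 0],
}
ORDER = [0, 2, 1, 3]


def y(i, k):
    """Stored trace-back bit y_i(k)."""
    return Y_ROWS[i][ORDER.index(k)]


def trace_window(start_state, end_time, steps):
    """States visited when tracing back `steps` time units from end_time."""
    states = [start_state]
    for i in range(end_time, end_time - steps, -1):
        x = states[-1]
        states.append(((x & 1) << 1) | y(i, x))   # DMSB(X) * y_i(X)
    return states


states = trace_window(0b11, 10, 9)    # X = 11 at time unit 10, back to time 1
print([format(s, "02b") for s in states])
print("Z =", states[-1] >> 1)         # MSB of the state vector at time unit 1
```

The list printed first runs from time unit 10 down to time unit 1, and the last line gives the decoded bit $Z = 0$.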

To decode the second decoded bit, the second decoding
window $W_2$ is needed. Thus one needs the survivor
at time unit 11. The state vector with the smallest metric
at time 11 is $X = 01$ and $\mathbf{y}_{11} = [1,1,1,0]^T$, which
are shown in Fig. 5(b). Using the same method above,
one can trace state vector 01 at time unit 11 back to time
unit 2, shown sequentially, and one obtains the state vector
$X = 00$ at time unit 2. Thus the second decoded bit is
$Z$ = MSB(00) = 0.

The same procedure is used again to decode the following
decoded bits. This procedure stops when the time
unit is equal to $L_c + 5K + 1$, where $L_c$ is the length of the
code word.

The trace-back algorithm is summarized as follows.
Note that because the start state is $S_0$ at time unit 0, one
assigns metric $M_0(k) = \infty$ for $k \neq 0$.

(1) Initially let $M_0(0) = 0$ and $M_0(k) = \infty$ for $k \neq 0$.

(2) For each $k = 0, 1, 2, 3$, find an $l \in \{0, 1, 2, 3\}$ such
that $M_{j-1}(l) + d_{j-1,j}(l,k)$ is minimum. Then

$$ M_j(k) = M_{j-1}(l) + d_{j-1,j}(l,k) $$
$$ y_j(k) = \mathrm{LSB}(l) $$

(3) If $j < 5K$, go to (2); otherwise, find an $m \in
\{0, 1, 2, 3\}$ such that $M_j(m)$ is minimum. Then

$$ X = m $$

Also, for $i = j, j - 1, \ldots, j - 5K + 1$,

$$ X = \mathrm{DMSB}(X) * y_i(X) $$

Then

$$ Z = \mathrm{MSB}(X) $$

(4) If $j = L_c + 5K$, stop; otherwise, if $j > L_c$, $j \leftarrow
j + 1$ and go to (3); otherwise, $j \leftarrow j + 1$ and go
to (2).


MSB($k$) denotes the first bit of the binary representation
of $k$. LSB($l$) denotes the last bit of the binary representation
of $l$. DMSB($X$) denotes the sequence of bits of
state vector $X$ without the first bit. $L_c$ is the length of
the code word. Step (2) is used to compute the partial
path metric and store the information for choosing each
survivor. Step (3) is used to trace back the survivor to
find the decoded bit. First, the state vector of the state
with the smallest survivor metric is assigned to $X$
when $j \ge 5K$. Then one can use this state vector to trace
back the survivor by the data $\mathbf{y}_i$. Once the trace-back
procedure is finished, the decoded bit is denoted by $Z$. Note
that in step (2), one needs $(5K \times 2^k)$ bits of storage space
to store $y_j(k)$. Also, in step (3), the trace-back procedure
takes about $5K$ cycles for each decoded bit. This method
therefore requires a long decoding time.
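The whole trace-back procedure can be sketched as a sliding-window decoder. Several choices here are assumptions rather than the article's specification: the names `acs` and `traceback_decode` are hypothetical, the window is 10 time units as in the example, the $y$ vectors are indexed by state number $k = 0, 1, 2, 3$ rather than in the order $0, 2, 1, 3$, ties go to the predecessor with LSB 0, and the trace runs for window minus one steps so that, as in the worked example, the first bit of the final state vector is the decoded bit.

```python
INF = float("inf")


def acs(metrics, r):
    """Step (2): new metrics M_j(k) and trace-back bits y_j(k), k = 0..3."""
    r1, r2 = int(r[0]), int(r[1])
    new_metrics, y = [], []
    for k in range(4):
        u, a1 = k >> 1, k & 1
        candidates = []
        for a2 in (0, 1):                      # a2 = LSB(l) of predecessor S_l
            l = (a1 << 1) | a2
            d = ((u ^ a1 ^ a2) != r1) + ((u ^ a2) != r2)
            candidates.append((metrics[l] + d, a2))
        m, bit = min(candidates)
        new_metrics.append(m)
        y.append(bit)
    return new_metrics, y


def traceback_decode(received, window=10):
    metrics = [0, INF, INF, INF]               # step (1)
    history = []                               # the last `window` y vectors
    decoded = []
    for r in received:
        metrics, y = acs(metrics, r)
        history.append(y)
        if len(history) > window:
            history.pop(0)
        if len(history) == window:             # step (3)
            x = metrics.index(min(metrics))    # X = m
            for y_i in reversed(history[1:]):  # window - 1 trace-back steps
                x = ((x & 1) << 1) | y_i[x]    # X = DMSB(X) * y_i(X)
            decoded.append(x >> 1)             # Z = MSB(X)
    return decoded


R = "00 01 11 10 10 10 11 00 11 11 01 00 10 10 11 10 11 11 01 01".split()
print(traceback_decode(R))  # [0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0]
```

Only the $y$ bits of the current window are stored, but the inner loop runs once per decoded bit, which is the decoding-time cost noted above.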


The advantage of this systolic Viterbi decoder is that 
the trace-back operation is accomplished by processing a 
systolic array of registers in a pipeline fashion instead of 
waiting for the whole trace-back procedure. As a conse- 
quence, this systolic structure reduces the decoding time. 
However, while one traces back the survivor from time $j$ to
$j - 1$, the survivor is simultaneously selected forward from
time $j$ to $j + 1$. Thus one needs extra storage space to store
the information needed to choose the survivor path at time 
unit j. As a result, one needs about twice the amount of 
storage space to store yj(k) as that needed in the original 
trace-back algorithm. This is expressed in detail in the fol- 
lowing description of the operation of the systolic Viterbi 
decoder. 

The systolic structure of the trace-back algorithm is
illustrated in Fig. 6. As shown in Fig. 6, this systolic-array
structure consists of a selection unit and $10K - 1 = 19$
path units. The selection unit processes step (2) of the
trace-back algorithm, i.e., at time unit $t$, it computes the
metrics of survivors of all nodes, selects the state vector
with the smallest metric, and stores the information $y_t(k)$
for selecting each survivor. Note that this selection unit
operates recursively. The inputs of the selection unit are
the received sequence $r_t$ and the metrics of the previous
survivors $P_{t-1}(l)$. The outputs are the metrics of the
survivors $P_t(l)$, the information for selecting survivors, and
the state vector $m$ of the state with the smallest metric.

Step (3) of the trace-back algorithm is modified and
implemented by $10K - 1 = 19$ path units in a pipeline
manner. Each path unit consists of a 4-bit register $Y_i$ and
a 2-bit register $X_i$ for odd $i$, or only a 4-bit register $Y_i$ for
even $i$. The register $Y_1$ is used to store the elements of
column vector $\mathbf{y}_j$ at time unit $j$, and the register $Y_i$ for
$1 < i \le 19$ is used to store the data shifted from register
$Y_{i-1}$. The register $X_1$ is used to store the state vector
$m$ of the state with the smallest metric, and the register
$X_i$ for $i = 3, 5, \ldots, 19$ is used to store the data shifted
from register $X_{i-2}$; the one-bit register $Z$ is used to store
the decoded bit. The Register Transfer Language (RTL)
developed in [6] is used to describe in detail the operation
of this systolic structure as shown in Fig. 6. At time
unit $t$, the contents of each register $Y_i$ for $i = 1, 2, \ldots, 18$ are
transferred to the following register $Y_{i+1}$, and the $y_t(k)$ for
$k = 0, 1, 2, 3$, which are the outputs of the selection unit, are
stored in the register $Y_1$. If $t > 5K$, the contents of register
$X_i$ without the first bit, for odd $i \le 17$, are concatenated
with the corresponding element in register $Y_i$, where the
address of this element is the contents of $X_i$, to generate a
new state vector. This new state vector is then transferred
to the following register $X_{i+2}$. Also, the state vector $m$
of the state with the smallest metric is transferred to the
register $X_1$, and the first bit of the contents of register $X_{19}$
is stored in register $Z$ as the decoded bit.
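The register transfers described above can be simulated clock by clock. The sketch below is an interpretation, not the RTL of [6]: the names `encode`, `acs`, and `systolic_decode` are hypothetical, all registers are assumed to start at zero, every transfer is applied on every clock (so the values of $Z$ before time unit 20 are meaningless pipeline fill), and the $Y_i$ contents are indexed by state number $k = 0, 1, 2, 3$.

```python
INF = float("inf")


def encode(bits):
    """(3,1/2) encoder, used here only to build a noiseless test input."""
    a1 = a2 = 0
    frames = []
    for u in bits:
        frames.append(f"{u ^ a1 ^ a2}{u ^ a2}")
        a1, a2 = u, a1
    return frames


def acs(metrics, r):
    """Selection unit: new survivor metrics and the bits y_t(k)."""
    r1, r2 = int(r[0]), int(r[1])
    new_metrics, y = [], []
    for k in range(4):
        u, a1 = k >> 1, k & 1
        m, bit = min(
            (metrics[(a1 << 1) | a2] + ((u ^ a1 ^ a2) != r1) + ((u ^ a2) != r2), a2)
            for a2 in (0, 1)
        )
        new_metrics.append(m)
        y.append(bit)
    return new_metrics, y


def systolic_decode(received, n_units=19):
    """Return the contents of register Z after each time unit."""
    metrics = [0, INF, INF, INF]
    Y = [[0, 0, 0, 0] for _ in range(n_units + 1)]   # Y[1..19]; Y[0] unused
    X = {i: 0 for i in range(1, n_units + 1, 2)}     # X[1], X[3], ..., X[19]
    z_out = []
    for r in received:
        metrics, y = acs(metrics, r)
        m = metrics.index(min(metrics))
        # All transfers read the old register contents (one clock cycle).
        z = X[n_units] >> 1                          # Z <- first bit of X_19
        new_X = {1: m}                               # X_1 <- m
        for i in range(1, n_units - 1, 2):           # odd i <= 17
            new_X[i + 2] = ((X[i] & 1) << 1) | Y[i][X[i]]
        Y = [Y[0], y] + Y[1:n_units]                 # Y_{i+1} <- Y_i, Y_1 <- y_t
        X = new_X
        z_out.append(z)
    return z_out


info = [0, 0, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0] * 2
z = systolic_decode(encode(info))
print(z[19:] == info[:len(info) - 19])  # True: one bit per clock from time 20
```

In this model each trace-back step is performed by a different path unit on a different clock, so one decoded bit emerges per time unit once the pipeline is full.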

Using the same example as that in Section III, one
obtains the contents of each register at different time units
as shown in Fig. 7. Note that at time unit $10K = 20$, one
obtains the first decoded information bit stored in register
$Z$, and then one more after each time unit. One thus obtains
the decoded information bits sequentially. Each odd path unit
stores $2^k$ bits of data in $Y_i$ and $k$ bits in $X_i$, while each
even path unit stores $2^k$ bits of data in $Y_i$.
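As a rough check of the storage figures for the (3,1/2) example, the register sizes quoted above (4 bits per $Y_i$, 2 bits per $X_i$, a window of 10 time units, and 19 path units) can be tallied as follows. The tally itself is an illustration added here, not a figure from the article.

```latex
\begin{aligned}
\text{original trace-back, } y_j(k) \text{ storage:} \quad & 10 \times 4 = 40 \text{ bits} \\
\text{systolic array, } Y_1, \ldots, Y_{19}: \quad & 19 \times 4 = 76 \text{ bits} \\
\text{systolic array, } X_1, X_3, \ldots, X_{19}: \quad & 10 \times 2 = 20 \text{ bits} \\
\text{ratio of } y \text{ storage:} \quad & 76 / 40 = 1.9
\end{aligned}
```

The ratio of 1.9 agrees with the statement that the systolic structure needs about twice the storage for $y_j(k)$.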


Note also that it is possible to further reduce the number
of registers. As shown in Fig. 7 at time unit 19, the
first bit of $X_{19}$ is the last bit of $X_{17}$ at time unit 18.
Thus the registers $Y_{18}$, $Y_{19}$, and $X_{19}$ are not needed in
this (3,1/2) Viterbi decoder, and register $Z$ is directly
connected to register $Y_{17}$ to store the first bit of the contents
of $Y_{17}$. Also, the first decoded bit is decoded at time unit
19 instead of time unit 20. With a structure similar to that
used for the (3,1/2) convolutional code, the general systolic
Viterbi decoder can be obtained.

It has been shown [3, 7] that one can use any state
vector to trace back if the window length is equal to $5K$.
This is due to the fact that all survivor paths most likely
will merge within a window length of $5K$. Then the operation
of selecting the state vector with the smallest metric
of the survivor in the selection unit is eliminated. As a
result, this tracking, updating, and storing of the hypoth- 
esized information sequence can be accomplished simul- 
taneously during a single clock cycle. This systolic-array 
structure minimizes the interconnections between compo- 
nents. Hence, this new architecture is suitable for a VLSI 
implementation. 

V. Conclusion 

The realization of the systolic Viterbi decoder for con- 
volutional codes is demonstrated. This fully parallel archi- 
tecture of decoding requires a minimum amount of storage 
space and decoding time. This makes it possible to readily 
implement a Viterbi decoder with VLSI circuits. 

Fig. 4. Survivor paths from time unit 3 to time unit 4 and the previous states
of $S_0$, $S_2$, $S_1$, and $S_3$: (a) the assignment of $y_4(k)$ in the trace-back
algorithm; (b) the traced-back state vector for each state.
Fig. 5. The column vector $\mathbf{y}_i$ and the state vector $X$ with the smallest metric: (a) the data of $\mathbf{y}_i$ in
decoding window $W_1$ for decoding the first bit; (b) the data of $\mathbf{y}_i$ in decoding window $W_2$ for
decoding the second bit.
Fig. 6. The structure of the systolic Viterbi decoder.